<meta charset="utf-8">
<title>Homework 1 - write a README.md</title>
<meta name="description" content="Use rmarkdown to add a README to your homework repository">
<meta name="author" content="Jillian Dunic">

WHEW! Okay, time for some plots. A much more rewarding endeavor. Plot making! Make sure that you have installed the ggplot2 package.
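If you're not sure whether the packages are installed, here is a quick way to check. This is only a sketch: it assumes you'll also want the gapminder package, since the examples below load it, and the variable names (needed, installed, missing_pkgs) are just made up for illustration.

```r
# Check that the packages used in this lesson are installed.
# (gapminder is included on the assumption that you will follow
# the examples below, which load it.)
needed <- c("ggplot2", "gapminder")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All set!")
}
```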

Basics - intro to the grammar of graphics

Here is a brief description of the basic building blocks of creating a ggplot.

| argument | description of component |
|---|---|
| data | as a data.frame (long format!) |
| aesthetic (aes) | mapping variables to visual properties - position, colour, line type, size |
| geom | actual visualisation of the data |
| scale | map values to the aesthetics - colour, size, shape (show up as legends and axes) |
| stat | statistical transformations, summaries of data (e.g., line fits) |
| facet | splitting data across panels based on different subsets of the data |
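To see how the pieces fit together, here is a rough sketch that uses every building block from the table in a single plot. The particular choices (gdpPercap on a log x-axis, a linear fit, one panel per continent) and the name all_blocks are just assumptions picked for illustration, not the only way to do it.

```r
library(ggplot2)
library(gapminder)

all_blocks <- ggplot(data = gapminder,                 # data
                     aes(x = gdpPercap, y = lifeExp,   # aesthetics
                         colour = continent)) +
  geom_point(alpha = 0.3) +                            # geom
  geom_smooth(method = "lm", se = FALSE) +             # stat: linear fit
  scale_x_log10() +                                    # scale
  facet_wrap(~ continent)                              # facet

all_blocks
```

Don't worry if some of these functions are unfamiliar; each building block gets its own section.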

Let’s start with a basic scatterplot of life expectancy over time. You’ll notice that we are telling ggplot that we will be using the gapminder data (a data.frame!) and then telling it that we want the year on the x-axis and life expectancy on the y. After that, we need to use the + to indicate that we want to add another layer - in this case we need to add points.

# Load ggplot2
library(ggplot2)

# Load gapminder
library(gapminder)
## Warning: package 'gapminder' was built under R version 3.1.3
# Basic scatterplot
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point()

Now let’s add some colour. Yes, 'colour' or 'color' can be used in the ggplot functions.

# We're going to colour the points by the discrete variable continent
ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()

Just as you can assign vectors, data.frames, and other R objects to a variable, you can also assign ggplots to variables.

p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()

And as we’ve seen before, no plot has been produced because it has been stored as the variable p. To view our plot, we can just call that variable.

p
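Storing a plot is handy for two reasons, sketched below: you can keep adding layers to it later, and you can print it on demand. One thing to watch for is that typing the name only draws the plot at the console; inside a loop or a function you need to call print() yourself. The names p_with_lines and plot_i are made up for this example.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()

# A stored plot can be extended later without retyping it
p_with_lines <- p + geom_line(aes(group = country))

# At the console, typing the name is enough. Inside a loop or a function,
# auto-printing does not happen, so call print() explicitly.
for (plot_i in list(p, p_with_lines)) {
  print(plot_i)
}
```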


Layering

Here’s where we’re going to demonstrate the way that you add layers to build up a plot.

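See here that when you just call ggplot without any geoms, none of the data gets plotted (depending on your ggplot2 version you will see either an empty panel or an error about missing layers)! You need to also tell it to add something!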

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = continent))

In this case, let’s add a line!

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = continent)) +
  geom_line()

What if we want to add more than just a line? No problem, let ggplot know that you are going to add something else using the +. Let’s add some points.

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = continent)) +
  geom_line() + 
  geom_point()

Okay, that’s great, but the points don’t really stand out against the colour of the lines. We can also be more specific with our layering and aesthetics. Notice how I moved the colour aesthetic into the geom_line() function, so only the lines are coloured by continent and the points fall back to the default black. You can think of aesthetics that are listed in the ggplot() function as the 'global' settings, which set the defaults for any geoms that come later in the plot.

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(colour = continent)) + 
  geom_point()
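A related trick, sketched here under my own choice of values: anything you put inside aes() is mapped to the data, while anything you put outside aes() is set to one fixed value for the whole layer. The grey colour, the point size, and the name layered are just illustrative.

```r
library(ggplot2)
library(gapminder)

layered <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  # mapped inside aes(): colour varies with the data and gets a legend
  geom_line(aes(colour = continent)) +
  # set outside aes(): every point gets the same fixed value, no legend
  geom_point(colour = "grey30", size = 0.8)

layered
```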

Exercise 1

Can you play with layers to create this plot? Solution

Scales

Remember, scales are what ultimately produce your axes and the legends for variables that are mapped to colour, size, or shape.

What difference do you notice between these two graphs?

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = log(gdpPercap))) +
  geom_point()

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = continent)) +
  geom_point()

If you haven’t noticed yet, let’s look back at our data.

str(gapminder)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : num  1952 1957 1962 1967 1972 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Cool! The default colour scale works differently for discrete and continuous variables: continent (a factor) gets a set of distinct colours, while log(gdpPercap) (numeric) gets a continuous gradient!
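If you want to predict which kind of colour scale a variable will get before you plot it, a quick check of its type is usually enough. This is just a small sketch using base R functions.

```r
library(gapminder)

# continent is a factor, so ggplot treats it as discrete
is.factor(gapminder$continent)
# TRUE

# log(gdpPercap) is numeric, so ggplot treats it as continuous
is.numeric(log(gapminder$gdpPercap))
# TRUE
```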

Ok, now that we’ve figured that out we can change the colour scales for our variables. With your neighbour, see if you can change the colour scales that are being used. You’ll likely need to use the scales section of the cheatsheet and a little bit of googling. Let me know if you guys need a hint. But I want you to try first.

Exercise 2

Solution

Exercise 3

Try using ggsave to save one of your plots to a file! Solution


Stats

Stats summarise data. Some examples include boxplots, model fits, density plots, and bars/bins. These are very similar to using
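Here are two of those stats in action, as a rough sketch. The choice of variables and the names box and fit are assumptions for illustration; in both cases the geom does the statistical summary for you behind the scenes.

```r
library(ggplot2)
library(gapminder)

# Boxplot: the stat computes the median, quartiles, and whiskers per continent
box <- ggplot(data = gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot()

# Model fit: the stat fits a straight line (lm) for each continent
fit <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_smooth(method = "lm")

box
fit
```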

Facetting

Other tips

1) Visualising overlapping data

Let’s return to our boring basic scatter plot.

ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point()

Sometimes, when you have many data points that overlap, you want to have a better idea of the amount of data at a given point. I find this often comes up when you are plotting discrete variables against continuous ones. There are two ways we can get a better look at the data. We can use jitter or adjust the transparency of the data.

# Using geom_jitter
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
    geom_jitter()

We could also use geom_point(position = position_jitter()) instead.

I would prefer if this plot didn’t have the points jittered so much. I can do this by changing the width and height of the jitter.

# Using geom_jitter with a smaller jitter width and height
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
    geom_jitter(position = position_jitter(width = 0.5, height = 0.5))

Another option is to use transparency.

# Adjusting the transparency of points

ggplot(data = gapminder, aes(x = year, y = lifeExp)) + 
    geom_point(alpha = 0.5)
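You can also combine the two approaches. The sketch below jitters the points sideways only, so the life expectancy values stay exact, and makes them semi-transparent so dense regions look darker. The specific width and alpha values and the name overlap are just assumptions to play with.

```r
library(ggplot2)
library(gapminder)

# Jitter only along x (height = 0) and add transparency
overlap <- ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point(position = position_jitter(width = 1, height = 0), alpha = 0.3)

overlap
```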


2) Themes

# Continuous variable examples

ggplot(data = gapminder, aes(x = lifeExp, y = log(pop))) + 
  geom_point(aes(colour = continent)) + 
  facet_wrap(~ continent)

ggplot(data = gapminder, aes(x = lifeExp, y = log(pop), colour = log(gdpPercap))) + 
  geom_point() + 
  scale_colour_gradient(low = 'blue', high = 'red')

ggplot(data = subset(gapminder, year == 2007)) +
  geom_bar(aes(x = country, fill = continent)) 

ggplot(data = gapminder, aes(x = lifeExp, y = gdpPercap)) + 
  geom_point() + 
  scale_y_log10()


ggplot(data = gapminder, aes(x = lifeExp, y = log(gdpPercap))) + 
  geom_point()

Take a look under the scales section. You’ll see that there